Clinical validation and utility of Percepta GSC for the evaluation of lung cancer

The Percepta Genomic Sequencing Classifier (GSC) was developed to up-classify as well as down-classify the risk of malignancy for lung lesions when bronchoscopy is non-diagnostic. We evaluated the performance of Percepta GSC in risk re-classification of indeterminate lung lesions. This multicenter study included individuals who currently or formerly smoked undergoing bronchoscopy for suspected lung cancer from the AEGIS I/II cohorts and the Percepta Registry. The classifier was measured in normal-appearing bronchial epithelium from bronchial brushings. The sensitivity, specificity, and predictive values were calculated using predefined thresholds. The ability of the classifier to decrease unnecessary invasive procedures was estimated. A total of 412 patients were included in the validation (prevalence of malignancy was 39.6%). Overall, 29% of intermediate-risk lung lesions were down-classified to low-risk with a 91.0% negative predictive value (NPV) and 12.2% of intermediate-risk lesions were up-classified to high-risk with a 65.4% positive predictive value (PPV). In addition, 54.5% of low-risk lesions were down-classified to very low risk with >99% NPV and 27.3% of high-risk lesions were up-classified to very high risk with a 91.5% PPV. If the classifier results were used in nodule management, 50% of patients with benign lesions and 29% of patients with malignant lesions undergoing additional invasive procedures could have avoided these procedures. The Percepta GSC is highly accurate as both a rule-out and rule-in test. This high accuracy of risk re-classification may lead to improved management of lung lesions.

The first generation Percepta Bronchial Genomic Classifier (BGC), developed using microarray-based gene expression technology, assesses cancer-associated gene expression in cytologically normal-appearing bronchial epithelial cells in the mainstem bronchus [21][22][23]. This classifier was designed to be a "rule-out" test for intermediate-risk patients with high sensitivity to detect malignancy, allowing their pre-test cancer risk to be re-classified to low (post-test) risk with a non-diagnostic bronchoscopy and a negative test result. Accuracy of the classifier was validated in two large observational multicenter studies, and the utility of the classifier was shown in a multicenter "real world" study [24][25][26]. In this "real world" prospective multicenter study, a negative Percepta result impacted clinical decision making by down-classifying a significant number of patients with an intermediate risk of malignancy, resulting in a change in clinical management from an invasive procedure to surveillance of the lung lesion [25].
The Percepta Genomic Sequencing Classifier (GSC) is an enhanced second-generation classifier prospectively developed using a testing platform with richer genomic features from whole transcriptome RNA sequencing in combination with clinical factors [27,28]. Analytical validation of Percepta GSC showed reproducibility of the test results in several clinical conditions, including sample collection, storage, shipping, and laboratory processing attesting to the robustness of the classifier across several technical variables [28]. In addition, the Percepta GSC was developed with multiple thresholds allowing it to serve as both a "rule-in" test and a "rule-out" test, thereby increasing its potential utility in improving risk stratification [27]. This study was designed to clinically validate the accuracy of the Percepta GSC in patients with lung lesions who had a non-diagnostic bronchoscopy.

Study design
Patients with an indeterminate lung lesion who had a non-diagnostic bronchoscopy from three different cohorts were evaluated for inclusion. The Airway Epithelium Gene Expression in the Diagnosis of Lung Cancer cohorts (AEGIS I and II) were recruited from multicenter prospective observational studies. Participants were included from 24 centers in the United States, Canada, and Ireland (S1 Fig) if they currently or formerly smoked and were undergoing bronchoscopy to evaluate lung nodules. The Percepta Registry cohort was a multicenter prospective registry that included patients with lung nodules who underwent clinically indicated diagnostic bronchoscopy at 34 medical centers across the US (S2 Fig). Each institution obtained institutional review board (IRB) approval before enrollment (Percepta Registry: WIRB# 20151039; AEGIS cohorts: IRB #20090312), and informed consent was obtained from all patients. Two bronchial brushings were performed during bronchoscopy, and mRNA was collected from bronchial epithelial cells from the right mainstem bronchus. Before bronchoscopy, physicians assessed the risk of malignancy (ROM) for each patient, designated as low (<10%), intermediate (10-60%), or high (>60%) [5]. Study personnel recorded lesion characteristics from the site radiologist report at each institution. In addition, baseline demographic data, smoking history, and subsequent procedures including bronchoscopy, transthoracic needle aspiration/biopsy, and surgical procedures were recorded. All patients were followed for at least 12 months after bronchoscopy unless a diagnosis of malignancy was confirmed.

Patient selection
Patients from the AEGIS cohorts and the Percepta Registry were randomly split into training and validation cohorts (S1 and S2 Figs). The previously described algorithm development process was restricted to the training cohort [27]. The algorithm development team was blinded to the validation cohort. Exclusion criteria included age < 21 years, inability to provide informed consent, lack of tobacco use (smoked < 100 cigarettes), or prior or concurrent cancer history. All patients underwent an adjudication process, described below, to determine whether the lesion was benign or malignant. Forty-five patients from the Percepta Registry who underwent adjudication and had stable imaging after 12 months but did not yet have a confirmed diagnosis were labeled "clinically benign" and excluded from the calculation of sensitivity and specificity of the Percepta GSC validation performance, as they did not have individual truth labels. However, because excluding them entirely could substantially overestimate cancer prevalence, these "clinically benign" lesions were included in calculating cancer prevalence.

Adjudication of diagnoses (Benign versus malignant lesions)
Diagnosis of a benign or malignant lesion was determined through an adjudication process. For the Percepta Registry Cohort, a live adjudication process was conducted by an expert three-member pulmonologist panel (HJL, DFK, LY) to arbitrate a benign, malignant, or inconclusive consensus diagnosis. Panel members were provided with de-identified patient information with at least 12 months of follow-up. Members of the panel were blinded to the Percepta GSC results.
A benign diagnosis was assigned in cases with 1) resolution of the lesion; 2) an alternative benign diagnosis; 3) lesion stability for ≥ 12 months and determination by the panel that the patient has no further suspicion of malignancy. Although two-year stability for radiographic imaging of lesions is recommended, this study included one-year follow-up of the lesion based upon prior studies that have found one-year nodule stability to be predictive of stability at two years [1,2]. A malignant diagnosis was assigned in cases with a pathology report confirming malignancy or a decision to treat a patient with stereotactic body radiation therapy (SBRT) without tissue confirmation.
To enhance confidence in the adjudication process, a subset of adjudicated patients underwent a second blinded independent central review by two independent oncologists, with adjudication by a third oncologist if needed. Reviewers were provided with the same clinical information as in the first adjudication process. Results were 95% concordant (Cohen's kappa = 0.88); therefore, data from the first adjudication were used for analysis.
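The concordance statistic reported above can be illustrated with a short calculation. The following is a minimal Python sketch of Cohen's kappa for an agreement table between two reviews; the study reports only the summary values (95% concordance, kappa = 0.88), so the counts below are hypothetical and the function name `cohens_kappa` is an assumption made for illustration.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of patients assigned category i by the first
    review and category j by the second review.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of patients on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the marginal proportions per category.
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical counts (NOT study data): rows = first adjudication,
# columns = second review; categories = [benign, malignant].
hypothetical_table = [
    [45, 3],
    [2, 50],
]

kappa = cohens_kappa(hypothetical_table)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw concordance for the agreement expected by chance, which is why it is lower than the percentage of matching diagnoses.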
The adjudication process for the AEGIS I and II cohorts was performed as previously described [24].

Evaluation of the validation performance and other statistical analysis
This independent validation set comprised 412 patients with lesions that included low, intermediate, and high pre-test ROM. Descriptive statistics are reported for clinical demographic data by cohort, with differences between cohorts tested with the chi-square test for categorical variables and the Wilcoxon rank test for continuous variables. All confidence intervals are two-sided 95% unless otherwise noted. Statistical analyses were performed in R (version 3.2.3, https://www.r-project.org). Performance of the classifier was first assessed with pre-specified cut-offs and associated 2x2 tables to calculate sensitivity, specificity, NPV, and PPV. Performance was also assessed independently of thresholds using a receiver operating characteristic (ROC) curve and calculating the area under the curve (AUC). The ROC curve provided a comprehensive evaluation of the Percepta GSC classifier performance across all three cohorts and in different pre-test ROM groups (S4 Fig).
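The threshold-based and threshold-free metrics described above can be sketched as follows. This is a minimal Python illustration, not the study's R code; all counts and scores are hypothetical, and the function names `diagnostic_metrics` and `auc` are assumptions introduced for this example.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # malignant lesions called positive
        "specificity": tn / (tn + fp),  # benign lesions called negative
        "ppv": tp / (tp + fp),          # positives that are malignant
        "npv": tn / (tn + fn),          # negatives that are benign
    }


def auc(scores_malignant, scores_benign):
    """Threshold-free AUC: the probability that a randomly chosen malignant
    case scores higher than a randomly chosen benign case (ties count half)."""
    wins = 0.0
    for m in scores_malignant:
        for b in scores_benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(scores_malignant) * len(scores_benign))


# Hypothetical 2x2 counts at one pre-specified cut-off (NOT study data).
metrics = diagnostic_metrics(tp=90, fp=60, fn=10, tn=40)

# Hypothetical classifier scores (NOT study data).
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4])

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"AUC: {example_auc:.3f}")
```

The pairwise formulation of the AUC is equivalent to the area under the empirical ROC curve and needs no plotting library; it is quadratic in sample size, which is acceptable for cohorts of a few hundred patients.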

Estimation of the potential impact of Percepta GSC on clinical management of indeterminate lung lesions
The potential impact of Percepta GSC on the number of invasive diagnostic procedures was assessed for patients in the AEGIS I and II cohorts. Patients in the Percepta Registry were excluded from this analysis due to the inclusion of the Percepta classifier in the clinical management of these lesions. The number of invasive procedures in patients with pre-test low- or intermediate-risk benign lesions who were down-classified by the Percepta GSC was calculated to determine the procedures that could have been avoided if the classifier result had been utilized during lesion management. Likewise, patients with pre-test intermediate- or high-risk Stage I or II malignant lesions who were up-classified by Percepta GSC and had an intervening diagnostic procedure or underwent surveillance prior to a definitive surgical procedure were assessed to determine if intervening procedures could have been avoided, thereby enabling an earlier diagnosis of malignancy.

Clinical study population and lesion characteristics
Four hundred twelve patients from the AEGIS cohorts (I and II) (246 patients) and the Percepta Registry (166 patients) were included in the validation cohort for the Percepta GSC (Table 1, S1 and S2 Figs). The most common histological types of cancer were adenocarcinoma (51%) followed by squamous cell (22%) lung cancer.
Table 1 notes: Percentages are calculated within each study cohort (AEGIS and the Percepta Registry, respectively); for sub-level breakdowns (cancer histologic subtype and benign condition), the denominator is the sub-group count. Infiltrates are pulmonary lesions with ill-defined margins and a diameter that cannot be accurately defined. "Clinically benign" patients did not have an adjudicated diagnosis but were included in the analysis for cancer prevalence to prevent an over-estimate.
A subset of patients in the validation set was identified as having a diagnosis of chronic obstructive pulmonary disease (COPD) based upon the clinical expertise of the investigator at the time of enrollment. In addition to the overall accuracy assessment, the accuracy of the Percepta GSC was assessed for patients with and without COPD. The sensitivity in those with COPD was slightly higher and the specificity slightly lower than in those without COPD (S3 Table).

Performance of Percepta GSC in indeterminate lesions stratified by risk of malignancy
We compared the overall performance of the Percepta GSC using a receiver operating characteristic (ROC) curve to provide a comprehensive evaluation of the classifier performance independent of the cut-offs in all three cohorts. We found that the overall performance of the Percepta GSC was similar in the AEGIS I and II cohorts compared to the Percepta Registry with an overall AUC of 0.73 (95% CI: 0.683-0.784), highlighting the robustness of the classifier performance across different patient cohorts (S4 Table and Table 3).
Eight of these 15 patients underwent a collective ten surgical procedures. Of the 75 intermediate- or high-risk malignant lesions diagnosed as Stage I or II lung cancer, 52 of these patients underwent an intervening diagnostic (not staging) procedure or surveillance CT scan before a surgical resection or ablative therapy. Fifteen (29%) of these 52 patients were up-classified by Percepta GSC and could have avoided an intervening diagnostic procedure if managed according to the Percepta GSC result (Table 4). Overall, 37 intermediate- or high-risk malignant lesions were up-classified by Percepta GSC. Of these, 29 (78%) patients were diagnosed with cancer >14 days after the bronchoscopy (turnaround time for Percepta GSC is < 14 days) with a median time to diagnosis of 36 days and a maximum time to diagnosis of 739 days. Approximately one-third of patients were diagnosed two months after the initial bronchoscopy (S6 Fig).
Two patients with malignant lesions who were down-classified from intermediate to low ROM (falsely down-classified) underwent surveillance CT scans or a diagnostic PET scan prior to a diagnosis of malignancy (average time to diagnosis was 8 ± 2 months), suggesting that if the false-negative classifier result had been known, it might not have impacted the time to diagnosis.
Five patients with benign lesions who were up-classified from intermediate to high ROM underwent six additional procedures, including two surgical resections, one of which occurred 12 days after the bronchoscopy, suggesting that knowledge of the false-positive Percepta GSC result may not have affected the clinical decision to undergo surgical resection.

Discussion
In this large, three-cohort clinical validation study of the second-generation Percepta lung nodule classifier, Percepta GSC, the accuracy of the classifier was validated in an independent sample set. High sensitivity with modest specificity for the rule-out portion of the classifier and high specificity with modest sensitivity for the rule-in portion were confirmed. By accurately down-classifying and up-classifying a portion of those with indeterminate lung lesions and a non-diagnostic bronchoscopy, the classifier may influence later management decisions to the patient's benefit.
With comprehensive machine learning, Percepta GSC can extract relevant genomic signals from transcriptomic sequencing and provide an accurate risk stratification for patients with a non-diagnostic bronchoscopy, one of the most challenging groups in the management of lung lesions. The classifier captures genomic alterations in bronchial epithelial cells collected during bronchoscopy with minimally invasive bronchial brushings [27,28]. Based upon the airway "field of injury" concept that has been previously validated, the classifier could quantify the impact of such genomic alterations on cancer risk, therefore successfully distinguishing malignant nodules from benign lesions [21,22].
The down-classification feature of the classifier enables a reduction in the risk of malignancy with a negative result. In contrast, a positive result confirms the pre-test risk assessment and management decisions. Similarly, the up-classification feature enables an increase in the risk of malignancy with a positive result, while a negative result would confirm pre-test risk assessment and management decisions. Therefore, a portion of those tested will have a test result that could change pre-test clinical management decisions, and a portion will confirm the pre-test management approach.
For those patients with an intermediate pre-test risk lung lesion and a non-diagnostic bronchoscopy, the classifier may be used to down-classify the risk, making the clinician more comfortable with surveillance of the lesion, or to up-classify the risk suggesting additional testing or treatment is warranted. In the population studied within this risk group, the observed sensitivity of 90.6% and specificity of 37.3% for those down-classified led to an actionable negative result in 29.4% of those tested with a ratio of true negative to false-negative results of 10:1. Thus if the test result led to surveillance imaging, ten patients with benign lesions might have avoided further testing, while one patient with a malignant lesion may have had further evaluation delayed. Similarly, for the group whose risk was up-classified from intermediate to high the specificity was 94.1%, with a sensitivity of 28.3%. The observed actionable positive result was 12.2% of those tested with a ratio of true positive to false positive of 1.9:1. Thus if the test result led to more aggressive testing or treatment, approximately two patients with malignant lesions would proceed to additional invasive testing or treatment while one patient with a benign lesion would do the same. Overall, 41.6% of patients with intermediate-risk lesions and non-diagnostic bronchoscopies were classified as a lower or higher risk group.
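The true-to-false ratios quoted above follow directly from the predictive values. The sketch below is a minimal Python illustration of that arithmetic, using the NPV (91.0%) and PPV (65.4%) reported for intermediate-risk lesions in the abstract; the function name `true_to_false_ratio` is an assumption for illustration, and the calculation ignores any rounding or prevalence adjustment the authors may have applied.

```python
def true_to_false_ratio(predictive_value):
    """Ratio of correct to incorrect calls among patients with a given result.

    For an NPV this is true negatives per false negative; for a PPV it is
    true positives per false positive.
    """
    return predictive_value / (1.0 - predictive_value)


# Predictive values reported for intermediate-risk lesions.
tn_per_fn = true_to_false_ratio(0.910)  # down-classified to low risk
tp_per_fp = true_to_false_ratio(0.654)  # up-classified to high risk

print(f"True negatives per false negative: {tn_per_fn:.1f}")
print(f"True positives per false positive: {tp_per_fp:.1f}")
```

Under these assumptions the ratios come out at roughly 10:1 and 1.9:1, in line with the figures stated in the text.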
The ability to risk stratify lesions with low and high pre-test probability of malignancy may lead to greater clinician or patient confidence with management choices. The test characteristics suggest that a negative result from the rule-out feature of the classifier may downgrade the risk of a patient with a low probability lesion and a positive result from the rule-in feature of the classifier may upgrade the risk of a patient with a high probability nodule. In the population studied, 54.5% of low-risk nodules were down-classified to very low risk without any false negatives observed. In comparison, 27.3% of high-risk nodules were up-classified to very high risk with a ratio of true positives to false positives of 12:1. Thus if it resulted in further aggressive therapy, approximately 12 patients with a malignant lesion would be referred for an additional invasive procedure, whereas one patient with a benign lesion would also undergo the same. When the classifier is used across categories of risk (low, intermediate, and high), 39.1% of test results will classify the patient to a category of risk that is different from their pre-test risk category.
Percepta GSC is an enhanced second-generation classifier that demonstrates improved performance characteristics compared to the first-generation BGC. For down-classifying intermediate risk patients, the GSC and BGC performance is similar overall; the point estimate of the NPV for GSC (91.0%
The comparison of test accuracy between those with and without COPD provides interesting insight into the nature of the classifier and the field of injury concept. In general, the classifier had a higher sensitivity and lower specificity in those with COPD, whether used as a rule-in or rule-out test. This may suggest biological overlap between genomic changes and COPD and lung cancer clinical features. This knowledge may further increase confidence in negative results in a COPD patient and positive results in those without COPD.
Overall, 27.5% of patients with low- and intermediate-risk lesions that were benign underwent further invasive procedures after a non-diagnostic bronchoscopy. Our analysis estimated that down-classification by Percepta GSC could have avoided invasive procedures in 50% of these patients. Importantly, patients with nodules that are down-classified by Percepta GSC should be managed according to the guideline recommendation for continued surveillance imaging until the nodule is ascertained to be benign. Given that there may be a small percentage of falsely down-classified malignant lesions, surveillance must be continued until a diagnosis is made or a sufficient duration of follow-up has occurred to confirm benign status. Additionally, the up-classification of intermediate- and high-risk malignant lesions by Percepta GSC would have decreased unnecessary diagnostic procedures in approximately 30% of these patients with the potential for an earlier diagnosis. While these results are promising, they are an estimate of Percepta GSC's impact on clinical management and, therefore, will need to be confirmed in a "real world" clinical setting. Additional studies are needed to determine directly how often Percepta test results change management decisions, as these decisions are heavily influenced by local treatment patterns and patient values and comorbidities.
Strengths of the study include use of three large, independent multicenter cohorts, which included patients from different types of clinical practices across several geographical locations to assess clinical accuracy metrics of the Percepta GSC. Additionally, the classifier was validated using a locked classifier after completion of algorithm development and technical validation phases. This updated classifier extends the range of potential utility by adding a rule-in component to the test for patients with a pre-test intermediate-risk lung lesion. Finally, this clinical validation of the Percepta GSC was performed in patients with a non-diagnostic bronchoscopy, thus reflecting the population where the test will have potential utility.
Limitations of the results include the adjudication process where follow-up was only required to be 12 months to determine benign status. This may have contributed to the inability to adjudicate a diagnosis for 45 patients (not included in the sensitivity and specificity metrics but used to estimate prevalence assuming benignity). Thus a few indolent lung cancers could have been present, and the true prevalence of malignancy may have been slightly higher. It is unclear whether identifying indolent malignancies would impact the utility of the classifier, as surveillance of indolent malignancies is less likely to influence outcomes. Additional prospective clinical utility studies would be helpful to further establish the benefits and performance of the classifier in real-world settings.
As is true with all risk of malignancy prediction models, shifts from one risk category to another are based on negative and positive predictive values, the calculation of which requires the prevalence of malignancy within those risk groups. This study utilized three independent cohorts to establish cancer prevalence at each risk level; however, prevalence may vary in an individual clinical practice. To assist with the application of the test, we provided figures showing post-test probabilities across a range of pre-test probabilities in the supplement, assuming consistent sensitivity and specificity across all pre-test ROMs (S3A-S3D Fig).
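The dependence of predictive values on prevalence can be made concrete with Bayes' rule. The following minimal Python sketch recomputes NPV and PPV across a range of pre-test probabilities, holding sensitivity and specificity fixed at the values reported in the Discussion for the intermediate-risk group (rule-out: 90.6% / 37.3%; rule-in: 28.3% / 94.1%). The prevalence grid and the function names `npv` and `ppv` are assumptions chosen for illustration; this is not a reproduction of the supplementary figures.

```python
def npv(sensitivity, specificity, prevalence):
    """Post-test probability of benign disease after a negative result."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


def ppv(sensitivity, specificity, prevalence):
    """Post-test probability of malignancy after a positive result."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Sensitivity/specificity reported for the intermediate-risk group.
RULE_OUT = {"sensitivity": 0.906, "specificity": 0.373}
RULE_IN = {"sensitivity": 0.283, "specificity": 0.941}

# Hypothetical pre-test probabilities spanning the intermediate range.
prevalences = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]

rows = [
    (p, npv(prevalence=p, **RULE_OUT), ppv(prevalence=p, **RULE_IN))
    for p in prevalences
]

print("prevalence  NPV(rule-out)  PPV(rule-in)")
for p, n, q in rows:
    print(f"{p:10.2f}  {n:13.3f}  {q:12.3f}")
```

As prevalence rises, the NPV of the rule-out threshold falls and the PPV of the rule-in threshold rises, which is why a practice with a cancer prevalence different from the study cohorts should expect different post-test probabilities.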
Finally, results presented here describe the evaluation of the clinical validity of Percepta GSC and are not intended to provide a comprehensive assessment of clinical utility. Clinician decisions are based not only on the probability of malignancy but also on the accuracy and safety of available testing, patient comorbidities, and patient preferences. This clinical validation study confirmed the accuracy of the Percepta GSC, showing high sensitivity for the rule-out portion of the classifier and high specificity for the rule-in portion of the classifier. Use of the classifier could impact clinical decisions in up to 40% of patients with lung lesions and indeterminate results from bronchoscopy. Further assessment of clinical utility is warranted.

S3 Fig. a) Negative predictive value (NPV) of the Percepta GSC across different pre-test cancer prevalence in patients who are classified from low to very low risk with specificity of 57.4% and sensitivity of 100%. The prevalence of lung cancer with and without these 45 clinically benign patients was 5.0% and 5.6% in the low pre-test ROM group, respectively. b) Negative predictive value (NPV) of the Percepta GSC across different pre-test cancer prevalence in patients who are classified from intermediate to low risk with specificity of 37.3% and sensitivity of 90.6%. The prevalence of lung cancer with and without these 45 clinically benign patients was 28

S4 Fig. Percepta GSC performance in the AEGIS I and II and Percepta Registry cohorts. a) Comparison of the receiver operating characteristic (ROC) curve of the Percepta GSC in all study patients in the AEGIS I and II cohorts and the Percepta Registry. b) Comparison of the ROC curve of the Percepta GSC in the low and intermediate risk of malignancy study patients in the AEGIS I and II cohorts and the Percepta Registry. The asterisk on each curve corresponds to the sensitivity/specificity pair at the decision boundary, where patients with scores above the decision boundary will maintain their risk of malignancy and patients with scores below the decision boundary will have their risk of malignancy down-classified (i.e. low to very low and intermediate to low). c) Comparison of the ROC curve of the Percepta GSC in the intermediate risk of malignancy study patients in the AEGIS I and II cohorts and the Percepta Registry. The asterisk on each curve corresponds to the sensitivity/specificity pair at the decision boundary, where patients with scores above the decision boundary will have their risk of malignancy up-classified from intermediate to high and patients with scores below the decision boundary will have their risk of malignancy stay as intermediate. d) Comparison of the ROC curve of the Percepta GSC in the high risk of malignancy study patients in the AEGIS I and II cohorts and the Percepta Registry. The asterisk on each curve corresponds to the sensitivity/specificity pair at the decision boundary, where patients with scores above the decision boundary will have their risk of malignancy up-classified from high to very high and patients with scores below the decision boundary will have their risk of malignancy stay as high.